Activity 08¶

Tyler Barna¶

Contents¶

  1. Import Modules and Data
  2. Visualizing Data
  3. Data
    3.1 Basic Linear Fit
    3.2 MCMC Analysis: No Error
    3.3 MCMC Analysis: With Error

1. Import Modules and Data¶

In [1]:
import csv
import dis
import inspect
import os
import sys
import time

import arviz as az
import astropy
import astroquery
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pprint
import pymc as pm
import random
import scipy as sp
import scipy.stats as stats
import seaborn as sns
import time
import warnings
warnings.filterwarnings('ignore')

from astropy import units as u
from astropy import constants as const
from astropy.coordinates import SkyCoord
from astropy.io import fits
from astropy.io import ascii
from astropy.time import Time
from astropy.time import TimeDelta
from astropy.timeseries import TimeSeries
from astropy.visualization import time_support
time_support()
from astropy.visualization import quantity_support
quantity_support()

from IPython.display import display_html
from IPython.display import Image

from numpy import interp

from pymc import Model, Normal, Gamma, find_MAP

from scipy import integrate
from scipy import linalg as la
from scipy import optimize

from scipy.stats import beta
from scipy.stats import betabinom
from scipy.stats import binom
from scipy.stats import gamma
from scipy.stats import invgamma
from scipy.stats import multivariate_normal as mvn
from scipy.stats import nbinom
from scipy.stats import norm
from scipy.stats import poisson
from scipy.stats import t
from scipy.stats import uniform

from sklearn.linear_model import LinearRegression as linreg
from sklearn import preprocessing as preproc

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
az.style.use("arviz-darkgrid")

pp = pprint.PrettyPrinter(indent=4)

## set seed for reproducibility
random.seed(5731)

## import data
df = pd.read_csv('cepheids.csv')
display(df.describe())
MV MV_err LogP LogP_err
count 203.000000 203.000000 203.000000 203.000000
mean -3.559606 0.096749 0.797635 0.031084
std 0.799984 0.032976 0.274683 0.020820
min -6.080000 0.080000 -0.040000 0.020000
25% -4.030000 0.090000 0.615000 0.020000
50% -3.580000 0.090000 0.780000 0.020000
75% -3.170000 0.100000 0.970000 0.020000
max 0.250000 0.460000 1.620000 0.070000

2. Visualizing Data¶

Let's visualize the data to see what we're working with.

In [2]:
## plot histogram of magnitude with errors
fig, ax = plt.subplots(figsize=(10,6), facecolor='white')
sns.histplot(data=df, x='MV', kde=True, bins=20, ax=ax)
ax.set_title('Histogram of Cepheid Magnitudes')
plt.show();

## plot histogram of log period with errors
fig, ax = plt.subplots(figsize=(10,6), facecolor='white')
sns.histplot(data=df, x='LogP', kde=True, bins=25, ax=ax)
ax.set_title('Histogram of Cepheid Log Periods')
plt.show();
In [3]:
## plot the log period vs magnitude with errors for both
fig, ax = plt.subplots(figsize=(10,8), facecolor='white')
ax.errorbar(df['LogP'], df['MV'], xerr=df['LogP_err'], yerr=df['MV_err'], fmt='o', color='black')
ax.set_xlabel('Log Period')
ax.set_ylabel('Magnitude')
ax.grid(alpha=0.75)
ax.set_title('Cepheid Log Period vs Magnitude')
plt.show();

3. Analysis¶

3.1 Basic Linear Fit¶

Before performing the linear regression with PyMC, let's try the naive approach of fitting a line to the data. We can do this with the np.polyfit function. This function takes the x and y data, and the order of the polynomial to fit. We'll use a first order polynomial, which is a line.

In [4]:
## equation for a line
line = lambda m,x,b: m*x + b

## fit the data with a linear regression with np.polyfit
m, b = np.polyfit(df['LogP'], df['MV'], 1)
estimated_MV = line(m,df['LogP'],b)

## plot the log period vs magnitude with errors for both
fig, ax = plt.subplots(figsize=(10,8), facecolor='white')
ax.errorbar(df['LogP'], df['MV'], xerr=df['LogP_err'], yerr=df['MV_err'], fmt='o', color='black', label='Data')

## now plot the estimated magnitude over top of it
ax.plot(df['LogP'], estimated_MV, color='red', label='Linear Regression')

## styling
ax.set_xlabel('Log Period')
ax.set_ylabel('Magnitude')
ax.grid(alpha=0.75)
ax.set_title('Cepheid Log Period vs Magnitude')
ax.legend()
plt.show();

We can compare this to the results of the mcmc analysis later.

3.2 MCMC Analysis: No Error¶

Now, I'll follow the steps outlined in the introductory file to perform the MCMC analysis. In this case, y will be the MV, and x will be LogP (there is only one x in our case)

In [5]:
x = df['LogP']
x_err = df['LogP_err']
y = df['MV']
y_err = df['MV_err']
basic_model = Model()
with basic_model:
    ## priors for unknown model parameters
    Beta = Normal('beta',mu=0,tau=1./10, shape=2)
    precision = Gamma('precision', alpha=1, beta=1)
    
    mu = Beta[0] + Beta[1]*x
    Y_obs = Normal('Y_obs', mu=mu, tau=precision, observed=y)
In [6]:
map_estimate = find_MAP(model=basic_model)
pp.pprint(map_estimate)
100.00% [19/19 00:00<00:00 logp = -283.42, ||grad|| = 516.17]
{   'beta': array([-2.0061211 , -1.94716412]),
    'precision': array(2.76414659),
    'precision_log__': array(1.01673194)}
In [7]:
# Initialize random number generator
RANDOM_SEED = 5731
rng = np.random.default_rng(RANDOM_SEED)

with basic_model:
    start = find_MAP() 

    # draw 1000 posterior samples
    trace = pm.sample( start=start,return_inferencedata=True)
100.00% [19/19 00:00<00:00 logp = -283.42, ||grad|| = 516.17]

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta, precision]
100.00% [8000/8000 00:06<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 7 seconds.
In [8]:
display(trace)
arviz.InferenceData
    • <xarray.Dataset>
      Dimensions:     (chain: 4, draw: 1000, beta_dim_0: 2)
      Coordinates:
        * chain       (chain) int64 0 1 2 3
        * draw        (draw) int64 0 1 2 3 4 5 6 7 ... 992 993 994 995 996 997 998 999
        * beta_dim_0  (beta_dim_0) int64 0 1
      Data variables:
          beta        (chain, draw, beta_dim_0) float64 -1.749 -2.209 ... -1.871
          precision   (chain, draw) float64 2.343 2.476 2.493 ... 3.018 2.402 2.466
      Attributes:
          created_at:                 2022-11-02T11:19:58.856329
          arviz_version:              0.12.1
          inference_library:          pymc
          inference_library_version:  4.2.1
          sampling_time:              6.896633625030518
          tuning_steps:               1000
      xarray.Dataset
        • chain: 4
        • draw: 1000
        • beta_dim_0: 2
        • chain
          (chain)
          int64
          0 1 2 3
          array([0, 1, 2, 3])
        • draw
          (draw)
          int64
          0 1 2 3 4 5 ... 995 996 997 998 999
          array([  0,   1,   2, ..., 997, 998, 999])
        • beta_dim_0
          (beta_dim_0)
          int64
          0 1
          array([0, 1])
        • beta
          (chain, draw, beta_dim_0)
          float64
          -1.749 -2.209 ... -2.074 -1.871
          array([[[-1.74883204, -2.20934138],
                  [-1.75961077, -2.28254188],
                  [-1.73497135, -2.23189342],
                  ...,
                  [-1.96825518, -1.88497426],
                  [-2.05852092, -1.94740155],
                  [-2.05852092, -1.94740155]],
          
                 [[-2.12378506, -1.85605044],
                  [-2.19052981, -1.71372502],
                  [-2.10905606, -1.84186285],
                  ...,
                  [-1.9787233 , -2.00171986],
                  [-1.73176876, -2.32718585],
                  [-1.66444166, -2.33474834]],
          
                 [[-1.79723632, -2.14298996],
                  [-1.76548296, -2.18891225],
                  [-2.27662437, -1.61649719],
                  ...,
                  [-1.96067392, -1.97682214],
                  [-2.03565224, -1.88729603],
                  [-2.10004586, -1.85517579]],
          
                 [[-1.79607115, -2.21535281],
                  [-1.94587412, -2.02457683],
                  [-1.98874297, -1.95203953],
                  ...,
                  [-2.00394175, -1.95880815],
                  [-2.08694497, -1.86777468],
                  [-2.07425705, -1.87107161]]])
        • precision
          (chain, draw)
          float64
          2.343 2.476 2.493 ... 2.402 2.466
          array([[2.34290642, 2.47631081, 2.49298152, ..., 2.68789082, 2.43405392,
                  2.43405392],
                 [2.6704533 , 2.72249877, 2.77937608, ..., 2.50053447, 2.61263797,
                  2.84012062],
                 [2.4945951 , 2.69782482, 2.76065585, ..., 2.48812783, 3.06493819,
                  2.4724689 ],
                 [2.7976668 , 3.12026179, 2.90092612, ..., 3.01843309, 2.40231403,
                  2.46625525]])
      • created_at :
        2022-11-02T11:19:58.856329
        arviz_version :
        0.12.1
        inference_library :
        pymc
        inference_library_version :
        4.2.1
        sampling_time :
        6.896633625030518
        tuning_steps :
        1000

    • <xarray.Dataset>
      Dimensions:      (chain: 4, draw: 1000, Y_obs_dim_0: 203)
      Coordinates:
        * chain        (chain) int64 0 1 2 3
        * draw         (draw) int64 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999
        * Y_obs_dim_0  (Y_obs_dim_0) int64 0 1 2 3 4 5 6 ... 197 198 199 200 201 202
      Data variables:
          Y_obs        (chain, draw, Y_obs_dim_0) float64 -0.7198 -0.5837 ... -0.993
      Attributes:
          created_at:                 2022-11-02T11:19:59.460390
          arviz_version:              0.12.1
          inference_library:          pymc
          inference_library_version:  4.2.1
      xarray.Dataset
        • chain: 4
        • draw: 1000
        • Y_obs_dim_0: 203
        • chain
          (chain)
          int64
          0 1 2 3
          array([0, 1, 2, 3])
        • draw
          (draw)
          int64
          0 1 2 3 4 5 ... 995 996 997 998 999
          array([  0,   1,   2, ..., 997, 998, 999])
        • Y_obs_dim_0
          (Y_obs_dim_0)
          int64
          0 1 2 3 4 5 ... 198 199 200 201 202
          array([  0,   1,   2, ..., 200, 201, 202])
        • Y_obs
          (chain, draw, Y_obs_dim_0)
          float64
          -0.7198 -0.5837 ... -0.7361 -0.993
          array([[[ -0.71977808,  -0.58373239,  -0.49464719, ...,  -7.09774412,
                    -0.75875483,  -0.90451448],
                  [ -0.65052634,  -0.61236394,  -0.46648248, ...,  -7.58332692,
                    -0.85290674,  -0.77298066],
                  [ -0.70410168,  -0.5607704 ,  -0.46353196, ...,  -7.43052513,
                    -0.75485575,  -0.88248789],
                  ...,
                  [ -0.64878314,  -0.50911917,  -0.42696062, ...,  -9.08629066,
                    -0.61292773,  -1.16623257],
                  [ -0.57093282,  -0.65789168,  -0.48439375, ...,  -8.95364669,
                    -0.81145603,  -0.88560462],
                  [ -0.57093282,  -0.65789168,  -0.48439375, ...,  -8.95364669,
                    -0.81145603,  -0.88560462]],
          
                 [[ -0.52494035,  -0.62509495,  -0.43936785, ..., -10.08471806,
                    -0.76332566,  -0.94160125],
                  [ -0.52912677,  -0.57827652,  -0.42308643, ..., -10.57727082,
                    -0.66649542,  -1.11567244],
                  [ -0.52685475,  -0.58680181,  -0.41431944, ..., -10.33090267,
                    -0.7180822 ,  -0.99717514],
          ...
                  [ -0.62689396,  -0.58509871,  -0.463433  , ...,  -8.53198822,
                    -0.72382302,  -0.98193742],
                  [ -0.5357825 ,  -0.51584694,  -0.36002218, ..., -10.77021872,
                    -0.66113215,  -1.0526745 ],
                  [ -0.57315034,  -0.62652763,  -0.47215929, ...,  -9.24915061,
                    -0.74721992,  -0.98072009]],
          
                 [[ -0.61624487,  -0.55663799,  -0.4049413 , ...,  -8.6148358 ,
                    -0.79639691,  -0.810072  ],
                  [ -0.54095561,  -0.52459833,  -0.35168709, ..., -10.41573413,
                    -0.72488128,  -0.9231211 ],
                  [ -0.5631664 ,  -0.53708715,  -0.38734247, ...,  -9.97098381,
                    -0.69502798,  -0.99159748],
                  ...,
                  [ -0.53087301,  -0.54375071,  -0.36964038, ..., -10.46663019,
                    -0.71888918,  -0.9523722 ],
                  [ -0.58864326,  -0.63334043,  -0.48570196, ...,  -8.94430677,
                    -0.75292358,  -0.97856346],
                  [ -0.58649062,  -0.61544493,  -0.47118483, ...,  -9.0772316 ,
                    -0.7360798 ,  -0.99295608]]])
      • created_at :
        2022-11-02T11:19:59.460390
        arviz_version :
        0.12.1
        inference_library :
        pymc
        inference_library_version :
        4.2.1

    • <xarray.Dataset>
      Dimensions:              (chain: 4, draw: 1000)
      Coordinates:
        * chain                (chain) int64 0 1 2 3
        * draw                 (draw) int64 0 1 2 3 4 5 6 ... 994 995 996 997 998 999
      Data variables: (12/16)
          energy_error         (chain, draw) float64 0.1591 0.0002092 ... -0.02435
          process_time_diff    (chain, draw) float64 0.001719 0.002559 ... 0.001123
          step_size_bar        (chain, draw) float64 0.3273 0.3273 ... 0.3308 0.3308
          largest_eigval       (chain, draw) float64 nan nan nan nan ... nan nan nan
          energy               (chain, draw) float64 194.8 192.1 191.7 ... 191.1 190.1
          max_energy_error     (chain, draw) float64 1.334 -0.22 ... 0.1259 -0.02435
          ...                   ...
          diverging            (chain, draw) bool False False False ... False False
          step_size            (chain, draw) float64 0.4004 0.4004 ... 0.3715 0.3715
          acceptance_rate      (chain, draw) float64 0.5756 0.9971 ... 0.9193 0.996
          tree_depth           (chain, draw) int64 3 3 2 4 2 4 4 4 ... 3 3 3 4 3 2 4 3
          perf_counter_start   (chain, draw) float64 5.132e+04 5.132e+04 ... 5.132e+04
          n_steps              (chain, draw) float64 7.0 7.0 3.0 11.0 ... 3.0 11.0 7.0
      Attributes:
          created_at:                 2022-11-02T11:19:58.866146
          arviz_version:              0.12.1
          inference_library:          pymc
          inference_library_version:  4.2.1
          sampling_time:              6.896633625030518
          tuning_steps:               1000
      xarray.Dataset
        • chain: 4
        • draw: 1000
        • chain
          (chain)
          int64
          0 1 2 3
          array([0, 1, 2, 3])
        • draw
          (draw)
          int64
          0 1 2 3 4 5 ... 995 996 997 998 999
          array([  0,   1,   2, ..., 997, 998, 999])
        • energy_error
          (chain, draw)
          float64
          0.1591 0.0002092 ... -0.02435
          array([[ 1.59074599e-01,  2.09155767e-04, -4.72462779e-02, ...,
                   1.07597224e-01, -6.00345947e-01,  0.00000000e+00],
                 [ 2.01594515e-01, -1.94750268e-01, -4.69665439e-03, ...,
                  -7.38413704e-02,  2.02868971e-01, -2.31727476e-01],
                 [ 4.53801946e-01, -8.32175626e-02, -2.44280373e-01, ...,
                  -2.84394979e-02, -3.07959502e-03, -9.95700017e-03],
                 [-4.89844560e-02, -4.41245128e-02,  8.04851882e-02, ...,
                  -5.18748886e-02,  5.46013417e-02, -2.43496809e-02]])
        • process_time_diff
          (chain, draw)
          float64
          0.001719 0.002559 ... 0.001123
          array([[0.0017191, 0.0025586, 0.0006852, ..., 0.0011932, 0.0015015,
                  0.0007783],
                 [0.0031024, 0.0018505, 0.0018464, ..., 0.0021934, 0.004828 ,
                  0.0014729],
                 [0.002416 , 0.0021675, 0.0031767, ..., 0.0025864, 0.0009939,
                  0.002235 ],
                 [0.0028979, 0.0029006, 0.0014995, ..., 0.0006548, 0.0018088,
                  0.0011226]])
        • step_size_bar
          (chain, draw)
          float64
          0.3273 0.3273 ... 0.3308 0.3308
          array([[0.32726484, 0.32726484, 0.32726484, ..., 0.32726484, 0.32726484,
                  0.32726484],
                 [0.32584279, 0.32584279, 0.32584279, ..., 0.32584279, 0.32584279,
                  0.32584279],
                 [0.31030501, 0.31030501, 0.31030501, ..., 0.31030501, 0.31030501,
                  0.31030501],
                 [0.33082471, 0.33082471, 0.33082471, ..., 0.33082471, 0.33082471,
                  0.33082471]])
        • largest_eigval
          (chain, draw)
          float64
          nan nan nan nan ... nan nan nan nan
          array([[nan, nan, nan, ..., nan, nan, nan],
                 [nan, nan, nan, ..., nan, nan, nan],
                 [nan, nan, nan, ..., nan, nan, nan],
                 [nan, nan, nan, ..., nan, nan, nan]])
        • energy
          (chain, draw)
          float64
          194.8 192.1 191.7 ... 191.1 190.1
          array([[194.84003611, 192.08046275, 191.72116722, ..., 191.55716682,
                  195.44984124, 192.61398831],
                 [190.01704478, 192.62436138, 190.49715398, ..., 189.26998995,
                  193.27112269, 193.41625813],
                 [191.54281862, 190.9129227 , 192.41880062, ..., 190.41594281,
                  191.09108227, 190.61952111],
                 [191.49860569, 191.74485751, 190.83451047, ..., 189.02368955,
                  191.06500218, 190.09472531]])
        • max_energy_error
          (chain, draw)
          float64
          1.334 -0.22 ... 0.1259 -0.02435
          array([[ 1.33425153, -0.22002416,  0.32935963, ..., -1.13004091,
                   1.92832774,  2.03416223],
                 [ 0.32341996,  1.5515633 , -0.02871881, ..., -0.13381057,
                   0.31570792,  0.2396984 ],
                 [ 1.4294946 , -0.3026361 ,  1.93467482, ...,  0.2138971 ,
                   0.08825633,  0.10400905],
                 [ 0.61074586, -0.04412451,  1.02988784, ..., -0.05187489,
                   0.12588297, -0.02434968]])
        • smallest_eigval
          (chain, draw)
          float64
          nan nan nan nan ... nan nan nan nan
          array([[nan, nan, nan, ..., nan, nan, nan],
                 [nan, nan, nan, ..., nan, nan, nan],
                 [nan, nan, nan, ..., nan, nan, nan],
                 [nan, nan, nan, ..., nan, nan, nan]])
        • perf_counter_diff
          (chain, draw)
          float64
          0.001862 0.002771 ... 0.001215
          array([[0.00186176, 0.00277059, 0.0007415 , ..., 0.0012908 , 0.00162561,
                  0.0008433 ],
                 [0.00336007, 0.00200115, 0.00199941, ..., 0.00237562, 0.00534766,
                  0.00159239],
                 [0.00261744, 0.00234757, 0.00344057, ..., 0.00280023, 0.00107651,
                  0.00242076],
                 [0.00313927, 0.00314161, 0.00162353, ..., 0.00070879, 0.00195895,
                  0.00121502]])
        • lp
          (chain, draw)
          float64
          -191.6 -191.3 ... -189.6 -189.2
          array([[-191.62569245, -191.30618368, -191.02567956, ..., -190.59442373,
                  -189.9596695 , -189.9596695 ],
                 [-189.17374776, -189.52334371, -188.69396928, ..., -189.07750922,
                  -191.68216491, -191.94075792],
                 [-190.41544352, -190.24798563, -190.68426501, ..., -189.14485308,
                  -189.00595794, -189.34695157],
                 [-189.89945952, -189.1592262 , -188.48702362, ..., -188.69604277,
                  -189.64569426, -189.23339507]])
        • index_in_trajectory
          (chain, draw)
          int64
          -4 -2 2 7 2 5 ... -5 -10 2 1 4 -5
          array([[ -4,  -2,   2, ...,  -2,  -1,   0],
                 [  5,  -2,  -2, ...,  -5,   4,   5],
                 [  4,  -3,   9, ...,   4,  -3, -10],
                 [-14,   3,  -2, ...,   1,   4,  -5]])
        • diverging
          (chain, draw)
          bool
          False False False ... False False
          array([[False, False, False, ..., False, False, False],
                 [False, False, False, ..., False, False, False],
                 [False, False, False, ..., False, False, False],
                 [False, False, False, ..., False, False, False]])
        • step_size
          (chain, draw)
          float64
          0.4004 0.4004 ... 0.3715 0.3715
          array([[0.40041988, 0.40041988, 0.40041988, ..., 0.40041988, 0.40041988,
                  0.40041988],
                 [0.38928435, 0.38928435, 0.38928435, ..., 0.38928435, 0.38928435,
                  0.38928435],
                 [0.35476712, 0.35476712, 0.35476712, ..., 0.35476712, 0.35476712,
                  0.35476712],
                 [0.37146061, 0.37146061, 0.37146061, ..., 0.37146061, 0.37146061,
                  0.37146061]])
        • acceptance_rate
          (chain, draw)
          float64
          0.5756 0.9971 ... 0.9193 0.996
          array([[0.57562105, 0.99708867, 0.82455727, ..., 0.98542703, 0.79109306,
                  0.41303887],
                 [0.85035119, 0.56346394, 0.99684314, ..., 1.        , 0.88643566,
                  0.91731339],
                 [0.51688342, 0.98415752, 0.52058266, ..., 0.94050677, 0.98166224,
                  0.96874545],
                 [0.78726086, 0.9985298 , 0.61997983, ..., 1.        , 0.91930953,
                  0.99604213]])
        • tree_depth
          (chain, draw)
          int64
          3 3 2 4 2 4 4 4 ... 3 3 3 4 3 2 4 3
          array([[3, 3, 2, ..., 3, 3, 2],
                 [4, 3, 3, ..., 3, 4, 3],
                 [4, 4, 4, ..., 4, 3, 4],
                 [4, 4, 3, ..., 2, 4, 3]])
        • perf_counter_start
          (chain, draw)
          float64
          5.132e+04 5.132e+04 ... 5.132e+04
          array([[51318.73308198, 51318.73517941, 51318.73822203, ...,
                  51321.66596179, 51321.66745554, 51321.66931058],
                 [51318.6763486 , 51318.68003234, 51318.68251326, ...,
                  51321.48270537, 51321.48551416, 51321.49108628],
                 [51319.10137048, 51319.10423513, 51319.10678531, ...,
                  51322.0712899 , 51322.07436521, 51322.07560642],
                 [51318.69131403, 51318.69470517, 51318.69807386, ...,
                  51321.60561752, 51321.60649353, 51321.60866581]])
        • n_steps
          (chain, draw)
          float64
          7.0 7.0 3.0 11.0 ... 3.0 11.0 7.0
          array([[ 7.,  7.,  3., ...,  7.,  7.,  3.],
                 [15.,  7.,  7., ...,  7., 15.,  7.],
                 [15., 15., 15., ..., 15.,  7., 15.],
                 [15., 15.,  7., ...,  3., 11.,  7.]])
      • created_at :
        2022-11-02T11:19:58.866146
        arviz_version :
        0.12.1
        inference_library :
        pymc
        inference_library_version :
        4.2.1
        sampling_time :
        6.896633625030518
        tuning_steps :
        1000

    • <xarray.Dataset>
      Dimensions:      (Y_obs_dim_0: 203)
      Coordinates:
        * Y_obs_dim_0  (Y_obs_dim_0) int64 0 1 2 3 4 5 6 ... 197 198 199 200 201 202
      Data variables:
          Y_obs        (Y_obs_dim_0) float64 -3.47 -3.15 -3.33 ... 0.25 -3.46 -4.86
      Attributes:
          created_at:                 2022-11-02T11:19:59.462411
          arviz_version:              0.12.1
          inference_library:          pymc
          inference_library_version:  4.2.1
      xarray.Dataset
        • Y_obs_dim_0: 203
        • Y_obs_dim_0
          (Y_obs_dim_0)
          int64
          0 1 2 3 4 5 ... 198 199 200 201 202
          array([  0,   1,   2, ..., 200, 201, 202])
        • Y_obs
          (Y_obs_dim_0)
          float64
          -3.47 -3.15 -3.33 ... -3.46 -4.86
          array([-3.47, -3.15, -3.33, -5.41, -3.7 , -3.66, -3.76, -3.86, -3.99,
                 -3.41, -4.35, -3.07, -3.51, -4.  , -3.35, -4.58, -3.06, -4.52,
                 -2.69, -4.02, -3.28, -3.34, -3.51, -4.13, -4.6 , -3.23, -3.3 ,
                 -4.27, -2.86, -3.21, -1.58, -3.36, -3.8 , -3.93, -3.85, -2.78,
                 -3.8 , -3.96, -3.69, -2.97, -3.92, -4.2 , -3.26, -4.13, -4.26,
                 -3.53, -4.09, -4.23, -3.89, -5.45, -2.76, -3.72, -2.28, -4.25,
                 -4.34, -4.47, -4.29, -3.59, -4.49, -3.2 , -3.96, -3.67, -2.99,
                 -3.18, -3.14, -3.53, -3.35, -4.24, -4.82, -4.38, -4.45, -5.79,
                 -4.58, -2.69, -4.12, -4.43, -6.08, -4.44, -4.22, -4.56, -3.41,
                 -3.58, -3.87, -4.02, -3.93, -3.46, -3.31, -3.69, -3.85, -4.09,
                 -3.35, -2.63, -3.38, -4.73, -2.26, -3.27, -3.64, -3.46, -3.75,
                 -3.39, -3.67, -3.67, -4.06, -4.05, -2.77, -3.88, -3.78, -3.65,
                 -3.63, -4.04, -2.84, -4.62, -4.38, -2.99, -3.88, -3.3 , -3.61,
                 -4.07, -3.41, -4.09, -3.95, -5.1 , -3.86, -4.18, -3.35, -4.17,
                 -3.45, -2.85, -3.33, -5.  , -4.07, -3.87, -3.04, -3.46, -2.78,
                 -2.26, -3.63, -2.43, -2.93, -3.14, -3.52, -3.58, -2.91, -3.65,
                 -2.57, -2.82, -3.36, -2.57, -3.73, -3.12, -3.45, -3.46, -3.63,
                 -2.43, -3.45, -2.89, -3.61, -2.55, -3.21, -2.45, -2.6 , -2.79,
                 -2.56, -2.67, -0.76, -3.38, -3.57, -3.13, -4.39, -3.83, -5.05,
                 -3.07, -4.08, -3.71, -3.16, -2.76, -3.34, -2.21, -3.61, -1.93,
                 -3.21, -2.89, -0.24, -3.46, -3.54, -2.94, -3.91, -3.19, -4.65,
                 -3.47, -4.51, -2.87, -3.69, -3.53, -4.16, -3.68, -3.21, -2.95,
                 -3.59, -3.66,  0.25, -3.46, -4.86])
      • created_at :
        2022-11-02T11:19:59.462411
        arviz_version :
        0.12.1
        inference_library :
        pymc
        inference_library_version :
        4.2.1

In [9]:
posterior = trace.posterior
display(posterior)
<xarray.Dataset>
Dimensions:     (chain: 4, draw: 1000, beta_dim_0: 2)
Coordinates:
  * chain       (chain) int64 0 1 2 3
  * draw        (draw) int64 0 1 2 3 4 5 6 7 ... 992 993 994 995 996 997 998 999
  * beta_dim_0  (beta_dim_0) int64 0 1
Data variables:
    beta        (chain, draw, beta_dim_0) float64 -1.749 -2.209 ... -1.871
    precision   (chain, draw) float64 2.343 2.476 2.493 ... 3.018 2.402 2.466
Attributes:
    created_at:                 2022-11-02T11:19:58.856329
    arviz_version:              0.12.1
    inference_library:          pymc
    inference_library_version:  4.2.1
    sampling_time:              6.896633625030518
    tuning_steps:               1000
xarray.Dataset
    • chain: 4
    • draw: 1000
    • beta_dim_0: 2
    • chain
      (chain)
      int64
      0 1 2 3
      array([0, 1, 2, 3])
    • draw
      (draw)
      int64
      0 1 2 3 4 5 ... 995 996 997 998 999
      array([  0,   1,   2, ..., 997, 998, 999])
    • beta_dim_0
      (beta_dim_0)
      int64
      0 1
      array([0, 1])
    • beta
      (chain, draw, beta_dim_0)
      float64
      -1.749 -2.209 ... -2.074 -1.871
      array([[[-1.74883204, -2.20934138],
              [-1.75961077, -2.28254188],
              [-1.73497135, -2.23189342],
              ...,
              [-1.96825518, -1.88497426],
              [-2.05852092, -1.94740155],
              [-2.05852092, -1.94740155]],
      
             [[-2.12378506, -1.85605044],
              [-2.19052981, -1.71372502],
              [-2.10905606, -1.84186285],
              ...,
              [-1.9787233 , -2.00171986],
              [-1.73176876, -2.32718585],
              [-1.66444166, -2.33474834]],
      
             [[-1.79723632, -2.14298996],
              [-1.76548296, -2.18891225],
              [-2.27662437, -1.61649719],
              ...,
              [-1.96067392, -1.97682214],
              [-2.03565224, -1.88729603],
              [-2.10004586, -1.85517579]],
      
             [[-1.79607115, -2.21535281],
              [-1.94587412, -2.02457683],
              [-1.98874297, -1.95203953],
              ...,
              [-2.00394175, -1.95880815],
              [-2.08694497, -1.86777468],
              [-2.07425705, -1.87107161]]])
    • precision
      (chain, draw)
      float64
      2.343 2.476 2.493 ... 2.402 2.466
      array([[2.34290642, 2.47631081, 2.49298152, ..., 2.68789082, 2.43405392,
              2.43405392],
             [2.6704533 , 2.72249877, 2.77937608, ..., 2.50053447, 2.61263797,
              2.84012062],
             [2.4945951 , 2.69782482, 2.76065585, ..., 2.48812783, 3.06493819,
              2.4724689 ],
             [2.7976668 , 3.12026179, 2.90092612, ..., 3.01843309, 2.40231403,
              2.46625525]])
  • created_at :
    2022-11-02T11:19:58.856329
    arviz_version :
    0.12.1
    inference_library :
    pymc
    inference_library_version :
    4.2.1
    sampling_time :
    6.896633625030518
    tuning_steps :
    1000
Posterior Analysis¶
In [10]:
ts = az.summary(trace)
display(az.summary(trace, round_to=3))
az.plot_trace(trace,figsize=(10, 9));
plt.show()

az.plot_posterior(trace, figsize=(10, 4));
plt.show();

az.plot_autocorr(trace, var_names=['beta'], filter_vars="like",  max_lag=200,combined=True,figsize=(10, 4))
plt.show();
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
beta[0] -2.002 0.130 -2.229 -1.747 0.003 0.002 1691.145 1682.825 1.001
beta[1] -1.953 0.153 -2.242 -1.678 0.004 0.003 1675.995 1619.767 1.002
precision 2.773 0.276 2.272 3.315 0.006 0.004 2194.811 1890.977 1.002
Posterior Predictive Sampling¶
In [11]:
with basic_model:
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)

display(trace.posterior_predictive)

az.plot_ppc(trace, num_pp_samples=100);
Sampling: [Y_obs]
100.00% [4000/4000 00:00<00:00]
<xarray.Dataset>
Dimensions:      (chain: 4, draw: 1000, Y_obs_dim_0: 203)
Coordinates:
  * chain        (chain) int64 0 1 2 3
  * draw         (draw) int64 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999
  * Y_obs_dim_0  (Y_obs_dim_0) int64 0 1 2 3 4 5 6 ... 197 198 199 200 201 202
Data variables:
    Y_obs        (chain, draw, Y_obs_dim_0) float64 -2.829 -3.34 ... -4.086
Attributes:
    created_at:                 2022-11-02T11:20:08.928194
    arviz_version:              0.12.1
    inference_library:          pymc
    inference_library_version:  4.2.1
xarray.Dataset
    • chain: 4
    • draw: 1000
    • Y_obs_dim_0: 203
    • chain
      (chain)
      int64
      0 1 2 3
      array([0, 1, 2, 3])
    • draw
      (draw)
      int64
      0 1 2 3 4 5 ... 995 996 997 998 999
      array([  0,   1,   2, ..., 997, 998, 999])
    • Y_obs_dim_0
      (Y_obs_dim_0)
      int64
      0 1 2 3 4 5 ... 198 199 200 201 202
      array([  0,   1,   2, ..., 200, 201, 202])
    • Y_obs
      (chain, draw, Y_obs_dim_0)
      float64
      -2.829 -3.34 ... -5.268 -4.086
      array([[[-2.82943325, -3.34045539, -5.04655808, ..., -1.41630188,
               -3.09071029, -3.55570303],
              [-3.03260276, -3.72108725, -2.9477874 , ..., -2.99766581,
               -3.74352765, -4.61105316],
              [-2.31757915, -2.81895986, -3.92909657, ..., -2.05428726,
               -3.65284889, -3.22054445],
              ...,
              [-2.3289842 , -4.18051307, -3.4965766 , ..., -2.56005395,
               -3.39842866, -3.36893744],
              [-3.65984436, -4.15079676, -3.59711691, ..., -3.47811414,
               -2.90125992, -4.88749816],
              [-1.50313234, -2.48705552, -2.18230086, ..., -2.50208994,
               -5.08423336, -4.55386843]],
      
             [[-3.30232254, -3.57770667, -4.61632074, ..., -2.44995747,
               -3.98001465, -4.92185008],
              [-3.43603823, -3.30884777, -5.23905776, ..., -2.9396893 ,
               -5.07124494, -4.17932939],
              [-2.39791666, -4.37887956, -3.20039804, ..., -3.96322895,
               -4.27136287, -4.76891517],
      ...
              [-3.24442841, -3.30801381, -3.85902436, ..., -1.53529598,
               -3.48410676, -5.44831596],
              [-2.85200134, -3.76551215, -3.51379925, ..., -1.88077196,
               -3.23809869, -3.73074811],
              [-3.57611356, -2.65693578, -3.03223085, ..., -1.522996  ,
               -3.16461063, -4.35566831]],
      
             [[-2.45500913, -3.53364847, -3.90432425, ..., -1.7959438 ,
               -4.32559844, -4.24017566],
              [-3.48245376, -2.62175806, -3.65175264, ..., -3.33525737,
               -4.37977103, -3.78709527],
              [-2.70905027, -2.20653481, -2.77765051, ..., -1.85473669,
               -4.57165575, -4.47516146],
              ...,
              [-3.63557464, -4.79278484, -3.07973707, ..., -2.44974107,
               -4.7837075 , -4.48592159],
              [-2.87901164, -4.03420312, -2.35681939, ..., -2.57984218,
               -2.84606416, -4.56961253],
              [-2.14642611, -3.36882346, -2.87411854, ..., -2.47603054,
               -5.26750285, -4.08596416]]])
  • created_at :
    2022-11-02T11:20:08.928194
    arviz_version :
    0.12.1
    inference_library :
    pymc
    inference_library_version :
    4.2.1

Let's compare the posterior predictive sampling to the data.

In [12]:
## equation for a line
line = lambda m,x,b: m*x + b

sortedP = np.linspace(df['LogP'].min()*2,df['LogP'].max()*2,1000)
## fit the data with a linear regression with np.polyfit
m, b = np.polyfit(df['LogP'], df['MV'], 1) ## not sorted here
estimated_MV = line(m,sortedP,b)

## estimate fit with posterior 
m_mean = line(ts['mean'][0],sortedP,ts['mean'][1])
m_03 = line(ts['hdi_3%'][0],sortedP,ts['hdi_3%'][1])
m_97 = line(ts['hdi_97%'][0],sortedP,ts['hdi_97%'][1])

## plot the log period vs magnitude with errors for both
fig, ax = plt.subplots(figsize=(10,8), facecolor='white')
ax.errorbar(df['LogP'], df['MV'], xerr=df['LogP_err'], yerr=df['MV_err'], 
            fmt='o', markersize=3, color='black', alpha=0.9, label='Data')

## plot the 97% confidence interval
ax.fill_between(sortedP, m_03, m_97, color='grey', alpha=0.5, label='Posterior 97% CI')

## now plot the estimated magnitude over top of it
ax.plot(sortedP, estimated_MV, 
        lw=6, color='teal', label='Linear Regression')

## plot the mean of the posterior distribution

ax.plot(sortedP, m_mean, color='aqua', label='Posterior Mean')



## styling
# ax.set_xlim(df['LogP'].min()*1.1,df['LogP'].max()*1.1)
ax.set_xlim(-0.07,1.75)
ax.set_xlabel('Log Period')
ax.set_ylabel('Magnitude')
ax.grid(alpha=0.5)
ax.set_title('Cepheid Log Period vs Magnitude')
ax.legend()
plt.show();

There's a greater difference here compared to the analysis from last week; we can see the basic method underestimates for low values and overestimates for high values

3.3 MCMC Analysis: With Error¶

In [ ]: